Avoid unnecessary flushing during unary response processing #1373

Conversation

petermattis (Contributor)

This reduces the number of packets sent when responding to a unary RPC
from 3 to 1. The reduction in packets improves latency, presumably due to
lower syscall and per-packet overhead.

See #1256

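To make the cost concrete, here is a minimal standalone sketch (not the grpc-go transport code; all names are illustrative) of why per-frame flushing matters: each Flush on a buffered writer becomes its own Write on the connection, i.e. its own syscall and typically its own TCP segment, so flushing after the headers, the data, and the trailers yields three packets where one would do.

```go
package main

import (
	"bufio"
	"fmt"
)

// countingWriter stands in for the TCP connection; every Write corresponds
// to a syscall and, typically, a separate packet on the wire.
type countingWriter struct{ writes int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	return len(p), nil
}

// send pushes header, data, and trailer "frames" through a buffered writer,
// optionally flushing after each one, and reports how many writes reached
// the underlying connection.
func send(flushEach bool) int {
	conn := &countingWriter{}
	w := bufio.NewWriter(conn)
	for _, frame := range []string{"HEADERS", "DATA", "TRAILERS"} {
		w.WriteString(frame)
		if flushEach {
			w.Flush()
		}
	}
	w.Flush() // a trailing flush of an empty buffer is a no-op
	return conn.writes
}

func main() {
	fmt.Println("flush per frame:", send(true))  // 3 writes -> 3 packets
	fmt.Println("single flush:   ", send(false)) // 1 write  -> 1 packet
}
```

Running it prints 3 writes for the flush-per-frame path and 1 for the single-flush path.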
@dfawley (Member) left a comment

Thanks for the optimization! Note that @MakMukhi is also working on a slightly different change to make the batching work correctly (there's a subtle bug preventing it for now). I have one minor nit/discussion point in the detailed comments.

@@ -800,7 +800,7 @@ func (s *Server) processUnaryRPC(t transport.ServerTransport, stream *transport.
 	}
 	opts := &transport.Options{
 		Last:  true,
-		Delay: false,
+		Delay: true,
dfawley (Member):

IMO "Delay" here should be interpreted by sendResponse as applying to the response as a whole -- header, payload, and trailer -- not to each piece independently.

So I believe this actually should still be "false", and it shouldn't be factored into the logic in sendResponse until the final step (where we look for other writers before flushing).
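For concreteness, a sketch of that reading with made-up types (this is not grpc-go's actual sendResponse or transport API): the whole response is buffered unconditionally, and Delay is only consulted at the final step, when deciding whether to flush now or leave the flush to whoever writes next.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// fakeTransport stands in for the server transport: a buffered writer over
// the "connection".
type fakeTransport struct {
	conn bytes.Buffer
	w    *bufio.Writer
}

func newFakeTransport() *fakeTransport {
	t := &fakeTransport{}
	t.w = bufio.NewWriter(&t.conn)
	return t
}

// sendResponse treats delay as a property of the response as a whole:
// headers, payload, and trailers are always buffered, and delay is only
// checked at the very end, together with whether another stream is about
// to write (and would therefore flush these frames for us).
func (t *fakeTransport) sendResponse(headers, payload, trailers []byte, delay, otherWritersPending bool) {
	t.w.Write(headers)
	t.w.Write(payload)
	t.w.Write(trailers)
	if !delay || !otherWritersPending {
		t.w.Flush()
	}
}

func main() {
	t := newFakeTransport()
	t.sendResponse([]byte("H"), []byte("D"), []byte("T"), false, false)
	fmt.Println("bytes on the wire:", t.conn.Len())
}
```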

petermattis (Contributor, author):

I agree. It seems like the calls to adjustNumWriter and the associated flushing are at too low a level. For unary RPCs, the number of writers should be incremented before calling sendResponse and decremented after the WriteStatus calls. The flushing logic currently in sendResponse is a bit difficult to reason about given its interaction with flow control, and it isn't obvious to me how to pull it up to a higher level. Perhaps this is the subtle bug you're referring to.

I'm happy to leave this work in your hands. Consider this PR more an indication of the importance of this work: the unnecessary flushes introduce a significant amount of latency!
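A rough sketch of that bracketing, with purely illustrative names (not how server.go is structured today): the unary handler accounts for its pending writes once around the whole response, so the transport's last-writer check covers the message and the status together and flushes only at the end.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// numWriters models the transport's count of streams that still intend to
// write; the transport only flushes when it drops to zero.
var numWriters int32

func flushIfLastWriter() {
	if atomic.AddInt32(&numWriters, -1) == 0 {
		fmt.Println("flush once")
	}
}

// handleUnary brackets the whole response -- message plus status -- with a
// single increment/decrement, instead of adjusting the count separately
// inside sendResponse and WriteStatus.
func handleUnary() {
	atomic.AddInt32(&numWriters, 1)
	fmt.Println("sendResponse (buffered, no flush)")
	fmt.Println("WriteStatus  (buffered, no flush)")
	flushIfLastWriter()
}

func main() { handleUnary() }
```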

@@ -893,11 +898,11 @@ func (t *http2Server) Write(s *Stream, data []byte, opts *Options) (err error) {
 		// Reset ping strikes when sending data since this might cause
 		// the peer to send ping.
 		atomic.StoreUint32(&t.resetPingStrikes, 1)
-		if err := t.framer.writeData(forceFlush, s.id, false, p); err != nil {
+		if err := t.framer.writeData(forceFlush && !opts.Delay, s.id, false, p); err != nil {
apolcyn (Contributor):

Might be wrong, but in the scenario where the client's inbound flow-control window is smaller than the server's response message, if I understand correctly this change could cause a deadlock, with the server running out of flow control and the client not yet having received anything.

This looks similar to the change in #973, which ran into that issue.
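To spell out the failure mode, a toy model of that scenario (window size, frame size, and message size are made up, and this is not the transport's real flow-control code): if nothing is ever flushed, the client never receives any bytes, never sends a WINDOW_UPDATE, and the server's send quota can never be replenished.

```go
package main

import "fmt"

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	const clientWindow = 64 * 1024 // assumed client inbound flow-control window
	const message = 256 * 1024     // response larger than that window
	const frameSize = 16 * 1024

	sent := 0             // bytes handed to the framer
	quota := clientWindow // remaining flow-control quota
	delivered := 0        // bytes actually flushed to the client

	for sent < message {
		if quota == 0 {
			if delivered == 0 {
				fmt.Println("deadlock: quota exhausted, but nothing was flushed,",
					"so the client can never send a WINDOW_UPDATE")
				return
			}
			// The real transport would block here until a WINDOW_UPDATE arrives.
			quota, delivered = clientWindow, 0
		}
		n := minInt(frameSize, minInt(quota, message-sent))
		sent += n
		quota -= n
		// With the delayed flush in effect, nothing is flushed here, so
		// `delivered` stays at zero.
	}
	fmt.Println("response fully sent")
}
```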

petermattis (Contributor, author):

@apolcyn I agree that this could lead to deadlock. Perhaps it would be sufficient to flush the buffered data whenever Write needs to loop. Also, I'm curious what the forceFlush logic is doing given that there is another check a few lines below to flush the data if we were the last writer.
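A sketch of that mitigation, continuing the same toy model (again, not the real transport code): flush whatever has been buffered before blocking on flow control, so the peer always has the bytes it needs to grant more window.

```go
package main

import "fmt"

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	const clientWindow = 64 * 1024
	const message = 256 * 1024
	const frameSize = 16 * 1024

	sent, quota, buffered := 0, clientWindow, 0

	for sent < message {
		if quota == 0 {
			// Flush anything buffered before waiting for more quota, so the
			// client can process it and send a WINDOW_UPDATE.
			fmt.Printf("flush %d buffered bytes, wait for WINDOW_UPDATE\n", buffered)
			buffered = 0
			quota = clientWindow // pretend the update arrived
		}
		n := minInt(frameSize, minInt(quota, message-sent))
		sent += n
		quota -= n
		buffered += n
	}
	fmt.Printf("done: %d bytes sent, %d still buffered for the final flush\n", sent, buffered)
}
```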

@MakMukhi (Contributor):

Hey,
Thanks for the idea. I'm already working on something similar (as you noticed in the issue you linked earlier). Would it be OK if I took over this effort along with the other thing?
Thanks.

@petermattis (Contributor, author):

@MakMukhi Yep, go for it.

@MakMukhi (Contributor):

Cool, closing this PR for now then. Will have something out possibly by end of next week.

MakMukhi closed this on Jul 21, 2017.
The lock bot locked this conversation as resolved and limited it to collaborators on Jan 18, 2019.